Overview

Dataset statistics

Number of variables: 11
Number of observations: 337336
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 28.3 MiB
Average record size in memory: 88.0 B
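These memory figures are consistent with simple arithmetic if each of the 11 columns is stored with 8 bytes per value (int64 or float64) and the index overhead is negligible. The report does not name the dtypes, so both points are assumptions, suggested by the NUM type and the 88.0 B record size. A minimal sketch of that check:

```python
# Sketch: reproduce the overview's memory figures, ASSUMING every column is an
# 8-byte numeric dtype (int64/float64) and ignoring the index overhead.
N_ROWS = 337_336
N_COLUMNS = 11
BYTES_PER_VALUE = 8          # assumption: int64/float64 storage
MIB = 1024 ** 2              # bytes per mebibyte

column_bytes = N_ROWS * BYTES_PER_VALUE      # one variable
total_bytes = N_COLUMNS * column_bytes       # the whole data frame
record_bytes = total_bytes / N_ROWS          # one row across all columns

print(f"Memory size per variable: {column_bytes / MIB:.1f} MiB")
print(f"Total size in memory: {total_bytes / MIB:.1f} MiB")
print(f"Average record size in memory: {record_bytes:.1f} B")
```

This prints 2.6 MiB, 28.3 MiB and 88.0 B. The per-variable figure matches the "Memory size 2.6 MiB" reported for every variable below.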

Variable types

NUM: 11

Reproduction

Analysis started: 2020-07-16 22:16:24.926881
Analysis finished: 2020-07-16 22:17:40.512863
Duration: 1 minute and 15.59 seconds
Version: pandas-profiling v2.8.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]

Variables

year
Real number (ℝ≥0)

Distinct count: 29
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2001.977512035478
Minimum: 1988
Maximum: 2016
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 1988
5th percentile: 1989
Q1: 1995
median: 2002
Q3: 2009
95th percentile: 2015
Maximum: 2016
Range: 28
Interquartile range (IQR): 14
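Range is Maximum minus Minimum (2016 - 1988 = 28), and the interquartile range is Q3 minus Q1 (2009 - 1995 = 14). The sketch below computes these quantile statistics with the standard library. It assumes linear interpolation between order statistics, via `statistics.quantiles(..., method="inclusive")`, which uses the same interpolation as pandas' default. `quantile_summary` is a hypothetical helper, and the input is toy data with one observation per year, not the real panel.

```python
import statistics

def quantile_summary(values):
    """Quantile statistics as listed in the report (hypothetical helper).

    Assumes linear interpolation between order statistics, i.e.
    statistics.quantiles(..., method="inclusive"), the same convention as
    pandas' default Series.quantile interpolation.
    """
    # n=100 yields the 1st..99th percentiles; list index k-1 holds the k-th
    pct = statistics.quantiles(values, n=100, method="inclusive")
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    lo, hi = min(values), max(values)
    return {
        "Minimum": lo,
        "5th percentile": pct[4],
        "Q1": q1,
        "median": median,
        "Q3": q3,
        "95th percentile": pct[94],
        "Maximum": hi,
        "Range": hi - lo,                        # Maximum - Minimum
        "Interquartile range (IQR)": q3 - q1,    # Q3 - Q1
    }

# Toy input: one observation per year, 1988..2016 (not the real panel)
summary = quantile_summary(list(range(1988, 2017)))
for name, value in summary.items():
    print(f"{name}: {value}")
```

On this toy input, Q1, the median, Q3, Range and IQR come out as 1995, 2002, 2009, 28 and 14, the same as in the report. The 5th and 95th percentiles (1989.4 and 2014.6) differ because the real data has roughly 11,650 rows per year rather than one.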

Descriptive statistics

Standard deviation: 8.354764043
Coefficient of variation (CV): 0.00417325569
Kurtosis: -1.201571662
Mean: 2001.977512
Median Absolute Deviation (MAD): 7
Skewness: 0.001011198733
Sum: 675339086
Variance: 69.80208222
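The coefficient of variation is the standard deviation divided by the mean (8.354764043 / 2001.977512 ≈ 0.004173). The variance is the squared standard deviation (8.354764043² ≈ 69.802). The sketch below computes the descriptive statistics under stated assumptions:

- sample (n - 1) variance;
- MAD as the median absolute deviation from the median, consistent with the integer MAD values reported;
- the bias-adjusted skewness and excess-kurtosis estimators used by pandas' `Series.skew()` and `Series.kurt()`.

`describe` is a hypothetical helper, not part of pandas-profiling.

```python
import math
import statistics

def describe(values):
    """Descriptive statistics in the report's layout (hypothetical helper).

    Assumptions: sample (n - 1) variance, MAD as the median of absolute
    deviations from the median, and the bias-adjusted skewness and excess
    kurtosis estimators used by pandas' Series.skew() and Series.kurt().
    Requires at least 4 values.
    """
    n = len(values)
    mean = statistics.fmean(values)
    variance = statistics.variance(values)       # divides by n - 1
    std = math.sqrt(variance)
    med = statistics.median(values)
    mad = statistics.median([abs(x - med) for x in values])

    # Biased central moments
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    g1 = m3 / m2 ** 1.5          # biased skewness
    g2 = m4 / m2 ** 2 - 3        # biased excess kurtosis

    return {
        "Standard deviation": std,
        "Coefficient of variation (CV)": std / mean,
        "Kurtosis": ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3)),
        "Mean": mean,
        "Median Absolute Deviation (MAD)": mad,
        "Skewness": math.sqrt(n * (n - 1)) / (n - 2) * g1,
        "Sum": sum(values),
        "Variance": variance,
    }

stats = describe([1, 2, 3, 4, 5])
print(stats["Skewness"], stats["Kurtosis"])   # 0.0 and -1.2 (up to rounding)
```

For the flat toy input 1..5, the kurtosis is -1.2, the excess kurtosis of a uniform distribution. The year column's kurtosis of about -1.20 likewise reflects its nearly equal counts per year.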
[Figure: Histogram with fixed size bins (bins=10)]

Common values
Value                   Count     Frequency (%)
2003                    11653     3.5%
2006                    11653     3.5%
1994                    11653     3.5%
1997                    11653     3.5%
1998                    11653     3.5%
1999                    11653     3.5%
2000                    11653     3.5%
2004                    11653     3.5%
2005                    11653     3.5%
1991                    11653     3.5%
Other values (19)       220806    65.5%
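Each Frequency (%) is the count divided by the number of observations. For year, the ten listed values cover 10 × 11653 = 116530 rows, which leaves 337336 - 116530 = 220806 rows (65.5%) for the 19 other years. Below is a sketch of how such a table can be assembled with the standard library, using a hypothetical `common_values` helper on toy data. The report does not say how it orders values with equal counts, so the tie order here may differ.

```python
from collections import Counter

def common_values(values, top=10):
    """'Common values' table (hypothetical helper): the `top` most frequent
    values, then one 'Other values (k)' row pooling the remaining k distinct
    values. Frequency (%) = count / number of observations * 100.
    """
    n = len(values)
    counts = Counter(values)
    # most_common keeps first-seen order among equal counts
    rows = [(value, count, 100 * count / n) for value, count in counts.most_common(top)]
    shown = sum(row[1] for row in rows)
    remaining = len(counts) - len(rows)
    if remaining:
        rows.append((f"Other values ({remaining})", n - shown, 100 * (n - shown) / n))
    return rows

for value, count, pct in common_values(["a", "a", "a", "b", "b", "c", "d"], top=2):
    print(f"{value}\t{count}\t{pct:.1f}%")
```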

Minimum 5 values
Value                   Count     Frequency (%)
1988                    11642     3.5%
1989                    11651     3.5%
1990                    11651     3.5%
1991                    11653     3.5%
1992                    11652     3.5%

Maximum 5 values
Value                   Count     Frequency (%)
2016                    11364     3.4%
2015                    11364     3.4%
2014                    11653     3.5%
2013                    11653     3.5%
2012                    11651     3.5%

zipcode
Real number (ℝ≥0)

Distinct count: 11653
Unique (%): 3.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 45406.37917091564
Minimum: 1001
Maximum: 99901
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.6 MiB
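Unique (%) is consistent with Distinct count divided by Number of observations (for zipcode, 11653 / 337336 ≈ 3.5%). The sketch below recomputes Unique (%) for all eleven variables from the distinct counts in this report. The "< 0.1%" display rule, used for percentages that would round to 0.0, is an assumption inferred from the figures shown, not a documented rule.

```python
# Distinct counts copied from this report; N is "Number of observations".
N_OBSERVATIONS = 337_336
DISTINCT = {
    "year": 29, "zipcode": 11653,
    "EQI_zip": 330219, "SFR_zip": 1537, "RECPI_zip": 331709,
    "EQI_MSA": 22829, "SFR_MSA": 3540, "RECPI_MSA": 22837,
    "EQI_state": 1444, "SFR_state": 1427, "RECPI_state": 1444,
}

def unique_pct(distinct, n=N_OBSERVATIONS):
    """Unique (%): share of distinct values among all observations."""
    return 100 * distinct / n

def fmt_pct(p):
    """Assumed display rule: values that would round to 0.0 show as '< 0.1%'."""
    return "< 0.1%" if p < 0.05 else f"{p:.1f}%"

for name, d in DISTINCT.items():
    print(f"{name}: {fmt_pct(unique_pct(d))}")
```

Every printed value matches the Unique (%) line of the corresponding variable.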

Quantile statistics

Minimum: 1001
5th percentile: 6040
Q1: 20109
median: 40118
Q3: 71909
95th percentile: 95693
Maximum: 99901
Range: 98900
Interquartile range (IQR): 51800

Descriptive statistics

Standard deviation: 29263.42758
Coefficient of variation (CV): 0.6444783335
Kurtosis: -1.135355825
Mean: 45406.37917
Median Absolute Deviation (MAD): 24359
Skewness: 0.3063681193
Sum: 1.531720632e+10
Variance: 856348193.7
[Figure: Histogram with fixed size bins (bins=10)]

Common values
Value                   Count     Frequency (%)
28658                   29        < 0.1%
97401                   29        < 0.1%
31833                   29        < 0.1%
78704                   29        < 0.1%
2903                    29        < 0.1%
33606                   29        < 0.1%
21084                   29        < 0.1%
31321                   29        < 0.1%
80241                   29        < 0.1%
98662                   29        < 0.1%
Other values (11643)    337046    99.9%

Minimum 5 values
Value                   Count     Frequency (%)
1001                    29        < 0.1%
1002                    29        < 0.1%
1005                    29        < 0.1%
1007                    29        < 0.1%
1010                    29        < 0.1%

Maximum 5 values
Value                   Count     Frequency (%)
99901                   29        < 0.1%
99801                   29        < 0.1%
99709                   29        < 0.1%
99701                   29        < 0.1%
99577                   29        < 0.1%

EQI_zip
Real number (ℝ≥0)

Distinct count: 330219
Unique (%): 97.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.000522528831953512
Minimum: 1.1998789e-05
Maximum: 0.0641894
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 1.1998789e-05
5th percentile: 0.000115540135
Q1: 0.0002105545275
median: 0.0003217026
Q3: 0.0005152216375
95th percentile: 0.001376939875
Maximum: 0.0641894
Range: 0.06417740121
Interquartile range (IQR): 0.00030466711

Descriptive statistics

Standard deviation: 0.0010000942
Coefficient of variation (CV): 1.913950272
Kurtosis: 491.5615115
Mean: 0.000522528832
Median Absolute Deviation (MAD): 0.000133222525
Skewness: 15.92369211
Sum: 176.2677861
Variance: 1.000188409e-06
[Figure: Histogram with fixed size bins (bins=10)]

Common values
Value                   Count     Frequency (%)
0.0004910004            58        < 0.1%
0.0011926616            47        < 0.1%
0.00023086638           46        < 0.1%
0.0003055213            44        < 0.1%
0.0016380961            42        < 0.1%
0.0006648359            41        < 0.1%
0.00022039752           39        < 0.1%
0.0006497198            36        < 0.1%
0.00073148106           32        < 0.1%
0.0001871098            31        < 0.1%
Other values (330209)   336920    99.9%

Minimum 5 values
Value                   Count     Frequency (%)
1.1998789e-05           1         < 0.1%
1.3727406e-05           1         < 0.1%
1.4215268e-05           1         < 0.1%
1.4599402e-05           1         < 0.1%
1.4817676e-05           1         < 0.1%

Maximum 5 values
Value                   Count     Frequency (%)
0.0641894               1         < 0.1%
0.057795137             1         < 0.1%
0.05652261              1         < 0.1%
0.055170864             1         < 0.1%
0.05462911              1         < 0.1%

SFR_zip
Real number (ℝ≥0)

Distinct count: 1537
Unique (%): 0.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 84.82791045130078
Minimum: 1.0
Maximum: 6883.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 1
5th percentile: 4
Q1: 14
median: 38
Q3: 99
95th percentile: 318
Maximum: 6883
Range: 6882
Interquartile range (IQR): 85

Descriptive statistics

Standard deviation: 137.5204495
Coefficient of variation (CV): 1.621169834
Kurtosis: 81.07312521
Mean: 84.82791045
Median Absolute Deviation (MAD): 29
Skewness: 5.741122524
Sum: 28615508
Variance: 18911.87402
[Figure: Histogram with fixed size bins (bins=10)]

Common values
Value                   Count     Frequency (%)
5                       7324      2.2%
6                       7278      2.2%
7                       7211      2.1%
8                       7094      2.1%
4                       6946      2.1%
9                       6781      2.0%
10                      6649      2.0%
3                       6490      1.9%
11                      6169      1.8%
12                      5906      1.8%
Other values (1527)     269488    79.9%

Minimum 5 values
Value                   Count     Frequency (%)
1                       3601      1.1%
2                       5407      1.6%
3                       6490      1.9%
4                       6946      2.1%
5                       7324      2.2%

Maximum 5 values
Value                   Count     Frequency (%)
6883                    1         < 0.1%
5858                    1         < 0.1%
4307                    1         < 0.1%
4201                    1         < 0.1%
4103                    1         < 0.1%

RECPI_zip
Real number (ℝ≥0)

Distinct count: 331709
Unique (%): 98.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.043653849650231946
Minimum: 1.1998789e-05
Maximum: 9.541773
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 1.1998789e-05
5th percentile: 0.000963155455
Q1: 0.00415424315
median: 0.013124878
Q3: 0.03799319525
95th percentile: 0.16244798
Maximum: 9.541773
Range: 9.541761001
Interquartile range (IQR): 0.0338389521

Descriptive statistics

Standard deviation: 0.1377787814
Coefficient of variation (CV): 3.156165665
Kurtosis: 617.5952807
Mean: 0.04365384965
Median Absolute Deviation (MAD): 0.0108302418
Skewness: 18.25116806
Sum: 14726.01503
Variance: 0.0189829926
[Figure: Histogram with fixed size bins (bins=10)]

Common values
Value                   Count     Frequency (%)
0.0004910004            42        < 0.1%
0.0003055213            35        < 0.1%
0.0011926616            35        < 0.1%
0.00023086638           35        < 0.1%
0.0016380961            32        < 0.1%
0.0006648359            32        < 0.1%
0.00022039752           31        < 0.1%
0.0006497198            23        < 0.1%
0.0005609932            23        < 0.1%
0.0001871098            23        < 0.1%
Other values (331699)   337025    99.9%

Minimum 5 values
Value                   Count     Frequency (%)
1.1998789e-05           1         < 0.1%
1.3727406e-05           1         < 0.1%
1.4817676e-05           1         < 0.1%
1.5237238e-05           1         < 0.1%
1.7317287e-05           3         < 0.1%

Maximum 5 values
Value                   Count     Frequency (%)
9.541773                1         < 0.1%
9.173729                1         < 0.1%
8.419393                1         < 0.1%
7.458536                1         < 0.1%
7.4182096               1         < 0.1%

EQI_MSA
Real number (ℝ≥0)

Distinct count: 22829
Unique (%): 6.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.0005629322789344037
Minimum: 2.37131e-05
Maximum: 0.015639344
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 2.37131e-05
5th percentile: 0.00014410826
Q1: 0.00025185168
median: 0.00038444618
Q3: 0.00058807
95th percentile: 0.0016214617
Maximum: 0.015639344
Range: 0.0156156309
Interquartile range (IQR): 0.00033621832

Descriptive statistics

Standard deviation: 0.0006875578172
Coefficient of variation (CV): 1.221386378
Kurtosis: 55.81173025
Mean: 0.0005629322789
Median Absolute Deviation (MAD): 0.00015124948
Skewness: 5.788314398
Sum: 189.8973232
Variance: 4.72735752e-07
[Figure: Histogram with fixed size bins (bins=10)]

Common values
Value                   Count     Frequency (%)
0.00047340643           863       0.3%
0.0002361842            863       0.3%
0.00045508775           863       0.3%
0.00044633303           863       0.3%
0.0002569714            863       0.3%
0.0003858068            863       0.3%
0.00037916107           863       0.3%
0.00046495508           863       0.3%
0.00021529387           863       0.3%
0.00041259924           863       0.3%
Other values (22819)    328706    97.4%

Minimum 5 values
Value                   Count     Frequency (%)
2.37131e-05             2         < 0.1%
2.7167029e-05           1         < 0.1%
2.9056831e-05           6         < 0.1%
3.2550477e-05           1         < 0.1%
3.9400034e-05           2         < 0.1%

Maximum 5 values
Value                   Count     Frequency (%)
0.015639344             13        < 0.1%
0.012493365             1         < 0.1%
0.0103107365            259       0.1%
0.0095638605            1         < 0.1%
0.00804525              2         < 0.1%

SFR_MSA
Real number (ℝ≥0)

Distinct count: 3540
Unique (%): 1.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 9500.565492565276
Minimum: 1.0
Maximum: 153589.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 1
5th percentile: 62
Q1: 366
median: 1718
Q3: 9603
95th percentile: 50318
Maximum: 153589
Range: 153588
Interquartile range (IQR): 9237

Descriptive statistics

Standard deviation: 17572.36921
Coefficient of variation (CV): 1.849612976
Kurtosis: 14.45310417
Mean: 9500.565493
Median Absolute Deviation (MAD): 1607
Skewness: 3.262329342
Sum: 3204882761
Variance: 308788159.7
[Figure: Histogram with fixed size bins (bins=10)]

Common values
Value                   Count     Frequency (%)
1                       3061      0.9%
50318                   1726      0.5%
58630                   863       0.3%
20741                   863       0.3%
22777                   863       0.3%
41773                   863       0.3%
53033                   863       0.3%
31768                   863       0.3%
63309                   863       0.3%
31112                   863       0.3%
Other values (3530)     325645    96.5%

Minimum 5 values
Value                   Count     Frequency (%)
1                       3061      0.9%
2                       196       0.1%
3                       119       < 0.1%
4                       144       < 0.1%
5                       162       < 0.1%

Maximum 5 values
Value                   Count     Frequency (%)
153589                  183       0.1%
148759                  183       0.1%
140521                  183       0.1%
132036                  183       0.1%
126138                  183       0.1%

RECPI_MSA
Real number (ℝ≥0)

Distinct count: 22837
Unique (%): 6.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.145562019091911
Minimum: 2.37131e-05
Maximum: 132.1134
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 2.37131e-05
5th percentile: 0.017198302
Q1: 0.11930856
median: 0.6555024
Q3: 5.2990756
95th percentile: 23.799837
Maximum: 132.1134
Range: 132.1133763
Interquartile range (IQR): 5.17976704

Descriptive statistics

Standard deviation: 10.29445346
Coefficient of variation (CV): 2.000647046
Kurtosis: 30.989684
Mean: 5.145562019
Median Absolute Deviation (MAD): 0.625868068
Skewness: 4.330708467
Sum: 1735783.309
Variance: 105.9757719
[Figure: Histogram with fixed size bins (bins=10)]

Common values
Value                   Count     Frequency (%)
15.110121               863       0.3%
13.04385                863       0.3%
13.122136               863       0.3%
13.686826               863       0.3%
13.685439               863       0.3%
14.254283               863       0.3%
13.604632               863       0.3%
16.727505               863       0.3%
10.166127               863       0.3%
12.650854               863       0.3%
Other values (22827)    328706    97.4%

Minimum 5 values
Value                   Count     Frequency (%)
2.37131e-05             2         < 0.1%
2.9056831e-05           6         < 0.1%
3.2550477e-05           1         < 0.1%
4.0748477e-05           6         < 0.1%
4.1634128e-05           22        < 0.1%

Maximum 5 values
Value                   Count     Frequency (%)
132.1134                145       < 0.1%
117.37917               145       < 0.1%
114.993835              145       < 0.1%
98.44064                145       < 0.1%
84.49912                145       < 0.1%

EQI_state
Real number (ℝ≥0)

Distinct count: 1444
Unique (%): 0.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.0006086236619870218
Minimum: 7.426358e-05
Maximum: 0.0037453347
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 7.426358e-05
5th percentile: 0.00016987788
Q1: 0.00030454743
median: 0.0004428451
Q3: 0.0006381583
95th percentile: 0.0018966781
Maximum: 0.0037453347
Range: 0.00367107112
Interquartile range (IQR): 0.00033361087

Descriptive statistics

Standard deviation: 0.0005196898527
Coefficient of variation (CV): 0.8538771743
Kurtosis: 5.144266833
Mean: 0.000608623662
Median Absolute Deviation (MAD): 0.00015669737
Skewness: 2.226484269
Sum: 205.3106716
Variance: 2.70077543e-07
[Figure: Histogram with fixed size bins (bins=10)]

Common values
Value                   Count     Frequency (%)
0.00042931232           918       0.3%
0.00030860028           918       0.3%
0.00031185208           918       0.3%
0.000318638             918       0.3%
0.00046185814           918       0.3%
0.000459364             918       0.3%
0.00045583185           918       0.3%
0.00043291016           918       0.3%
0.00030589235           918       0.3%
0.0004638483            918       0.3%
Other values (1434)     328156    97.3%

Minimum 5 values
Value                   Count     Frequency (%)
7.426358e-05            103       < 0.1%
7.8000994e-05           103       < 0.1%
7.82489e-05             103       < 0.1%
8.127372e-05            103       < 0.1%
8.188262e-05            11        < 0.1%

Maximum 5 values
Value                   Count     Frequency (%)
0.0037453347            408       0.1%
0.0030291597            408       0.1%
0.002947922             408       0.1%
0.0027841919            707       0.2%
0.002605618             707       0.2%

SFR_state
Real number (ℝ≥0)

Distinct count: 1427
Unique (%): 0.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 53603.12074904546
Minimum: 40.0
Maximum: 330536.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 40
5th percentile: 4881
Q1: 16805
median: 35247
Q3: 68137
95th percentile: 168977
Maximum: 330536
Range: 330496
Interquartile range (IQR): 51332

Descriptive statistics

Standard deviation: 56886.5504
Coefficient of variation (CV): 1.061254449
Kurtosis: 5.398992322
Mean: 53603.12075
Median Absolute Deviation (MAD): 22169
Skewness: 2.177968699
Sum: 1.808226234e+10
Variance: 3236079616
[Figure: Histogram with fixed size bins (bins=10)]

Common values
Value                   Count     Frequency (%)
113623                  1649      0.5%
18675                   1085      0.3%
83609                   918       0.3%
80986                   918       0.3%
111615                  918       0.3%
57966                   918       0.3%
135807                  918       0.3%
84556                   918       0.3%
60289                   918       0.3%
134403                  918       0.3%
Other values (1417)     327258    97.0%

Minimum 5 values
Value                   Count     Frequency (%)
40                      1         < 0.1%
63                      1         < 0.1%
65                      1         < 0.1%
71                      1         < 0.1%
94                      1         < 0.1%

Maximum 5 values
Value                   Count     Frequency (%)
330536                  800       0.2%
313960                  800       0.2%
295716                  800       0.2%
287591                  800       0.2%
276889                  800       0.2%

RECPI_state
Real number (ℝ≥0)

Distinct count: 1444
Unique (%): 0.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 34.09501161652317
Minimum: 0.01783298
Maximum: 442.21994000000007
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 0.01783298
5th percentile: 1.5838081
Q1: 6.9673257
median: 16.268452
Q3: 35.56701
95th percentile: 93.418846
Maximum: 442.21994
Range: 442.202107
Interquartile range (IQR): 28.5996843

Descriptive statistics

Standard deviation: 57.85914507
Coefficient of variation (CV): 1.696997371
Kurtosis: 19.34143997
Mean: 34.09501162
Median Absolute Deviation (MAD): 11.1700247
Skewness: 4.109350654
Sum: 11501474.84
Variance: 3347.680669
[Figure: Histogram with fixed size bins (bins=10)]

Common values
Value                   Count     Frequency (%)
34.266712               918       0.3%
34.65051                918       0.3%
38.58028                918       0.3%
43.55995                918       0.3%
27.161385               918       0.3%
39.853615               918       0.3%
37.070427               918       0.3%
30.707523               918       0.3%
32.96178                918       0.3%
32.927654               918       0.3%
Other values (1434)     328156    97.3%

Minimum 5 values
Value                   Count     Frequency (%)
0.01783298              11        < 0.1%
0.02767606              14        < 0.1%
0.031676006             11        < 0.1%
0.03247159              11        < 0.1%
0.033504028             11        < 0.1%

Maximum 5 values
Value                   Count     Frequency (%)
442.21994               707       0.2%
424.64432               707       0.2%
397.9958                707       0.2%
345.09607               707       0.2%
308.3075                707       0.2%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables. This implies that, for an exact linear relationship, the slope (the angle to the x-axis) does not affect r; only its sign does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
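A minimal sketch of this definition in plain Python; `pearson_r` is a hypothetical helper, not the profiler's implementation. The normalising factors of the covariance and the standard deviations cancel, so raw sums of products suffice. The usage lines illustrate the invariance described above: shifting x and rescaling it by a positive factor leaves r unchanged.

```python
import math

def pearson_r(x, y):
    """Pearson's r: covariance of X and Y over the product of their standard
    deviations. The 1/n (or 1/(n-1)) factors cancel, so plain sums are used."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

x = [1, 2, 3, 4]
y = [1, 3, 2, 5]
print(pearson_r(x, y))                          # about 0.83
print(pearson_r([5 + 2 * a for a in x], y))     # unchanged by shift and positive rescale
print(pearson_r(x, [10 - 3 * a for a in x]))    # -1 (up to rounding): exact linear, negative slope
```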

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
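A sketch of this calculation. It uses the common convention that tied values receive the average of the ranks they span; this is an assumption, because the text does not say how ties are handled. `average_ranks` and `spearman_rho` are hypothetical helpers. The usage line checks a monotonic but nonlinear relationship (cubes), for which ρ is 1.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # extend `end` over the run of values equal to values[order[start]]
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1   # mean of 1-based positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: covariance of the rank variables over the product of
    their standard deviations (Pearson's r applied to the ranks)."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    return cov / (sd_x * sd_y)

x = [1, 2, 3, 4, 5]
print(average_ranks([10, 20, 20, 30]))          # [1.0, 2.5, 2.5, 4.0]
print(spearman_rho(x, [v ** 3 for v in x]))     # 1 (up to rounding): monotonic, nonlinear
```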

Kendall's τ

Like Spearman's rank correlation coefficient, Kendall's rank correlation coefficient (τ) measures the ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n - 1)/2 for n observations.
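A direct sketch of this definition, the variant usually called tau-a; `kendall_tau_a` is a hypothetical helper. Tau-b, which SciPy's `kendalltau` computes by default, additionally adjusts the denominator for ties, so the two agree only when there are no ties.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """Kendall's tau (tau-a): (concordant - discordant) / total number of pairs.

    A pair (i, j) is concordant if x and y change in the same direction between
    the two observations and discordant if they change in opposite directions;
    pairs tied in x or y count as neither. Examines all O(n^2) pairs.
    """
    n = len(x)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))   # 5 concordant, 1 discordant: 4/6
```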

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient for a bivariate normal input distribution. Extensive documentation is available in the phik package documentation.

Missing values

Sample

First rows

        year  zipcode  EQI_zip   SFR_zip  RECPI_zip  EQI_MSA   SFR_MSA  RECPI_MSA  EQI_state  SFR_state  RECPI_state
0       1988  1001     0.000815  48.0     0.039108   0.001021  1235.0   1.260888   0.001476   17558.0    25.92194
1       1988  1002     0.001764  35.0     0.061751   0.001021  1235.0   1.260888   0.001476   17558.0    25.92194
2       1988  1005     0.000752  8.0      0.006015   0.000428  104.0    0.044531   0.001476   17558.0    25.92194
3       1988  1007     0.000776  18.0     0.013976   0.001021  1235.0   1.260888   0.001476   17558.0    25.92194
4       1988  1010     0.000924  7.0      0.006471   0.001021  1235.0   1.260888   0.001476   17558.0    25.92194
5       1988  1013     0.000849  22.0     0.018671   0.001021  1235.0   1.260888   0.001476   17558.0    25.92194
6       1988  1020     0.000805  46.0     0.037015   0.001021  1235.0   1.260888   0.001476   17558.0    25.92194
7       1988  1027     0.003389  10.0     0.033888   0.001021  1235.0   1.260888   0.001476   17558.0    25.92194
8       1988  1028     0.000957  30.0     0.028711   0.001021  1235.0   1.260888   0.001476   17558.0    25.92194
9       1988  1030     0.000868  35.0     0.030392   0.001021  1235.0   1.260888   0.001476   17558.0    25.92194

Last rows

        year  zipcode  EQI_zip   SFR_zip  RECPI_zip  EQI_MSA   SFR_MSA  RECPI_MSA  EQI_state  SFR_state  RECPI_state
337326  2016  99503    0.000080  624.0    0.049858   0.000083  1993.0   0.165981   0.000082   3847.0     0.315002
337327  2016  99507    0.000080  109.0    0.008732   0.000083  1993.0   0.165981   0.000082   3847.0     0.315002
337328  2016  99508    0.000086  85.0     0.007283   0.000083  1993.0   0.165981   0.000082   3847.0     0.315002
337329  2016  99515    0.000081  102.0    0.008313   0.000083  1993.0   0.165981   0.000082   3847.0     0.315002
337330  2016  99518    0.000133  63.0     0.008356   0.000083  1993.0   0.165981   0.000082   3847.0     0.315002
337331  2016  99577    0.000083  88.0     0.007272   0.000083  1993.0   0.165981   0.000082   3847.0     0.315002
337332  2016  99701    0.000088  62.0     0.005454   0.000078  280.0    0.021749   0.000082   3847.0     0.315002
337333  2016  99709    0.000075  73.0     0.005441   0.000078  280.0    0.021749   0.000082   3847.0     0.315002
337334  2016  99801    0.000064  70.0     0.004492   0.000066  78.0     0.005128   0.000082   3847.0     0.315002
337335  2016  99901    0.000076  47.0     0.003571   0.000073  54.0     0.003929   0.000082   3847.0     0.315002